Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed bug in unicode.jl/encode16 #10948

Merged
merged 1 commit into from
Apr 22, 2015
Merged

Conversation

ScottPJones
Copy link
Contributor

This fixes the bug in encode16 (called by convert and utf16), which incorrectly handles Unicode characters above 0x100000, and doesn't check for invalid characters above 0x10FFFF.
It updates the test cases as well to make sure the maximum valid Unicode character is tested.

@stevengj stevengj changed the title [Scott Paul Jones] Fixed bug in unicode.jl/encode16 Fixed bug in unicode.jl/encode16 Apr 22, 2015
@stevengj
Copy link
Member

Thanks @ScottPJones. A couple of comments:

Note that you can amend the commit by making the changes, doing git commit --amend, and then doing git push -f ..... to overwrite your old commit.

(I noticed that you pushed the change to your master branch. This is fine, but in future pull requests you probably work on a different branch, e.g. via git checkout -b utf16_fix to create a separate branch. This will make life easier for you in multiple ways.)

@stevengj stevengj added unicode Related to unicode characters and encodings backport pending labels Apr 22, 2015
else
push!(buf, UInt16(0xd7c0 + (c>>10) & 0x3ff))
push!(buf, UInt16(0xdc00 + c & 0x3ff))
throw(ArgumentError("invalid Unicode character (>0x10FFFF)"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe (> 0x10ffff), since we usually use spaces around > in Julia code, and lower-case hex is the default Julia style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks for all the advice! I’ll fix this up and do the git commit --amend / git push -f

@pao
Copy link
Member

pao commented Apr 22, 2015

git already attaches your github username to the commit

And your real name in the author field as long as you've configured git config --global user.name.

@@ -55,9 +55,11 @@ function encode16(s::AbstractString)
c = reinterpret(UInt32, ch)
if c < 0x10000
push!(buf, UInt16(c))
elseif c <= 0x10FFFF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower-case hex to match the surrounding style

@ScottPJones
Copy link
Contributor Author

Hmmm... I’d like to also change it so that it displays the character value when it gets the ArgumentError... I have checked that it doesn’t create a GC frame or change in-line ability, as Stefan was concerned about, but my problem is that doing $c interpolates the number in decimal, and I want it out as 0x and hex digits... (I know this is a stupid newbie question!)

@ScottPJones
Copy link
Contributor Author

Ah, never mind, found it! 0x$(hex(c))

@ScottPJones
Copy link
Contributor Author

I have now updated the tests to add all of the suggestions. Thanks!

JeffBezanson added a commit that referenced this pull request Apr 22, 2015
Fixed bug in unicode.jl/encode16
@JeffBezanson JeffBezanson merged commit 44433b9 into JuliaLang:master Apr 22, 2015
@ScottPJones
Copy link
Contributor Author

Now, I'd like to look at improving the performance of some of these functions in utf8, utf16, and utf32...
The code does Buff = Uint16[], and then a push! for every character... I did a simple test, and changing things to Buff = Array{UInt16,1}(len) and then setting each element was 5 times faster, with 1/2 the memory allocated, 1/2 the GC time and 1/2 the # of pauses...
(just a for loop of a million times, doing buff = UInt16[] ; for j=1:20 ; push!(buff,j) ; end, or buff = Array{UInt16,1}(20) ; for j=1:20 ; buff[j] = j ; end) )

I now know I messed up by pushing from my master... can you please help a newbie out, and tell me
what I need to do to get my GitHub repository (master) sync'ed back up to be the same as JuliaLang/julia...
I assume then if I do the "git checkout -b utf16_fix", I'll be fine for the next round of changes, right?

Thanks so much everybody!

@pao
Copy link
Member

pao commented Apr 23, 2015

This should be a reasonably bulletproof recipe. I assume here that you have two remotes; "julialang" is the remote that points at the JuliaLang/julia repository on GitHub, and "mine" points at ScottPJones/julia on GitHub. Substitute as needed. You can check your remotes with git remote -a.

git checkout master
git branch old-master #this is your backup branch pointer if this goes wrong
git fetch julialang master #grab the latest commits
git reset --hard julialang/master #this will modify the working copy and branch pointer to resync to master; you shouldn't normally need to do this
git checkout -b topic/make-utf16-fast #new working branch
git push mine HEAD  #push topic/make-utf16-fast to your GitHub repository

@pao
Copy link
Member

pao commented Apr 23, 2015

Oh, and don't worry about your master branch on GitHub being out of sync...in fact, you don't even need it:

git push mine --delete master #delete remote master branch
git push mine :master #same thing, oldskool syntax ("push nothing to master")

@ScottPJones
Copy link
Contributor Author

@pao I did what you said, but after that, when I tried to rebuild, I got a strange error:

 cd ../.. && /bin/sh /Volumes/Dynize/julia/deps/fftw-3.3.4-double/missing automake-1.14 --gnu dft/simd/Makefile
/Volumes/Dynize/julia/deps/fftw-3.3.4-double/missing: line 81: automake-1.14: command not found
WARNING: 'automake-1.14' is missing on your system.
         You should only need it if you modified 'Makefile.am' or
         'configure.ac' or m4 files included by 'configure.ac'.
         The 'automake' program is part of the GNU Automake package:
         <http://www.gnu.org/software/automake>
         It also requires GNU Autoconf, GNU m4 and Perl in order to run:
         <http://www.gnu.org/software/autoconf>
         <http://www.gnu.org/software/m4/>
         <http://www.perl.org/>
make[5]: *** [Makefile.in] Error 127
make[4]: *** [install-recursive] Error 1
make[3]: *** [install-recursive] Error 1
make[2]: *** [/Volumes/Dynize/julia/usr/lib/libfftw3.dylib] Error 2
make[1]: *** [install-fftw] Error 2
make: *** [julia-deps] Error 2

I actually have automake version 1.15 installed on my Mac... did something change in the last two days?
Julia built fine two days ago, from the fork that I made for fix #10919...

Thanks, Scott

@tkelman
Copy link
Contributor

tkelman commented Apr 25, 2015

some timestamp issue might be confusing autotools - try make -C deps distclean-fftw which will clear out the existing fftw build and re-do it from scratch

@pao
Copy link
Member

pao commented Apr 25, 2015

What @tkelman said is a good general rule of thumb for weird-looking dependency build problems.

@ScottPJones
Copy link
Contributor Author

Thanks @tkelman & @pao, I ended up "fixing" the problem by completely removing my local git repository and rebuilding from scratch, your advice will help save me some hours the next time that happens!

@pao
Copy link
Member

pao commented Apr 25, 2015

There's a final tier of cleanliness if make distclean doesn't work. You can remove all files not checked into the repository with git clean, which at least avoids needing to check out the repository again. I'll ask that you look up the arguments yourself so that you understand them, because this particular git function can cause dataloss.

tkelman pushed a commit that referenced this pull request Apr 26, 2015
…> 0x100000

(cherry picked from commit 054aeb0)
ref PR #10948

Conflicts:
	base/utf16.jl
	test/unicode.jl
stevengj added a commit that referenced this pull request Apr 26, 2015
@tkelman
Copy link
Contributor

tkelman commented Apr 26, 2015

backported in 8eafd07

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
unicode Related to unicode characters and encodings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants